ABSTRACT


The grammatical description of sentence generation for a particular language
is usually split into separate morphological and syntactical modules that are applied
autonomously. Combining the variety of morphological forms with the enumeration of
constructional expansions produces an enormous list of possible structures
of varying complexity, with no regard for the statistical plausibility of the
construction in question. The usual way to reduce the complexity of a
syntactical construction is to embed the morphological block inside the syntactical one,
thus restricting the item structures to those strictly possible for the given entity and
taking into account the preferences of item occurrences.


Russian prepositional constructions are a clear case of exuberant
variability in structural complexity when we have to interpret the meaning of the
governee nouns, their syntactical semantics, and the governor element: some full word
in a sentence, or a predicative or nominal centre. In Russian, the ambiguity of
interpreting a prepositional construction arises from the several meanings of the
primary prepositions, the several noun forms with different senses that combine with the
preposition, and the possible differences among the semantic classes implied by the
governee nouns.


We construct an ontology of Russian prepositional constructions based on
corpus statistics and propose a sample from the grammatical module aimed at the
analysis of the above-mentioned structural variables.
INTRODUCTION


In this paper we present a prototype grammar description of
Russian prepositional constructions. Grammatical text analysis is usually split into
separate morphological and syntactical modules, though autonomous grammatical
interpretation compounds the errors made on different levels. A semantic
module may help to disambiguate grammatical constructions as well as the
lexical choices that result from the lemmatization procedure; however, it is not
clear in which terms the semantics of a sentence, and of the text as a
whole, should be described. We use the AGFL formalism from the family of affix generative
grammars [1], which allows a hierarchy of categories and their values to be introduced. They




NORDSCI Conference 


may be syntactical, morpho-syntactical, or semantico-syntactical. In this paper we
concentrate on showing how to introduce an interpretation of the
latter type for Russian prepositional constructions.


The identification of semantic units and their relations to each other is an
essential part of automatic text analysis, though the recent tendency of neural
network methods in natural language processing to dispense with linguistic processing
is likely to lead to a dead end in the near future. Real effectiveness will come from
applying the tactics of neural networks within the strategy of semantico-syntactical
analysis, which is indispensable if we want to extract information or content from text.


Prepositional constructions are a crucial part of syntactical and
semantico-syntactical automatic text analysis. The first problem is prepositional phrase (PP)
attachment; the second is the interpretation of the relations between the governor word for PP
attachment and the nouns or pronouns in the prepositional construction (governees).


For quite a long time, prepositions in the Russian language remained outside the
scrutiny of specialists in automatic text analysis. In information retrieval systems
they were included in "stop word" lists, which prevented their use in search
models. Indeed, from the point of view of information retrieval they can,
as a rule, be neglected because of their “low” nominativity; that is, they are not
semantic identifiers of document content. For semantically oriented
text analysis, however, they are certainly important, since they convey certain
semantico-syntactical relations between content words and clarify the characteristics of a
predicate, the space-time specifications of propositions, etc.


CORE GROUP OF RUSSIAN PRIMARY PREPOSITIONS 


In [2], in these proceedings, we presented the core group of Russian primary
prepositions that we use to illustrate our method of constructing a prepositional ontology
and its use in the affix generative grammar [1]. They are as follows: “в”
(‘in’), “на” (‘on’), “с” (‘with’), “по” (‘by’), “к” (‘to’), “из” (‘from’), “у” (‘at’), “за”
(‘behind’), “от” (‘from’), “о” (‘about’). The preposition “в” (‘in’) is the most
frequent. According to the statistics given in [3], the frequency of any preposition tends
to vary with the stylistic and thematic balance of the corpus, though “в” (‘in’)
never moves from the highest rank. This suggests that the distribution of
semantic prepositional groups in corpus contexts may outline the grammatical
oppositions present in the semantic continuum of prepositional constructions.


As we described in [2], the prepositional ontology has a hierarchical structure. The
most abstract concepts are semantic rubrics, which are realized by means of
syntaxemes: the minimal syntactical-morphological prepositional constructions with
particular meanings. Syntaxemes are further divided into subtypes, which convey
lexico-grammatical meanings and may be expressed with secondary prepositions in
a variety of textual forms. Notions from the two topmost levels of the ontology are
grammatical in nature and require a special approach. In [2] we posit a quantitative
interpretation of Jakobson’s idea of the indicative categories [4]: an
approximation to the prepositional frequency ratio “1.5 to 1” may be interpreted as a
sign that the less frequent member of the pair has some grammatical markedness.


In fact, we do not know the exact number and nature of the semantic
rubrics. The same indeterminacy holds for the number and realization of
syntaxemes. The syntaxemes mentioned in [5] may be regarded as a basis for this list,
but they were described within the functional approach, without reference to
corpus statistics or a generative grammar perspective. The number of
senses listed for a preposition such as “в” (‘in’) in the Russian explanatory dictionary [6]
is enormous, and these senses share the same drawback: there is no statistical assessment
of the variants, and their level of granularity depends on lexicographical principles.


We take lists of semantic roles as a starting point for semantic rubrics. For
example, the relative frequencies of semantic roles in the annotated English corpus
Penn TreeBank [7] are: subject (.35); temporal (.113); locative (.075); direction (.026);
manner (.021); purpose (.017); extent (.010).


We look for the abstract distribution of senses of the most frequent preposition
“в” (‘in’) in a random sample of corpus contexts, in a way that lets us compare the
observed frequencies with the proportions expected for indicative categories. On the
basis of our balanced corpus we distinguish the following rubric frequencies:
localization 8090 IPM (instances per million corpus tokens); temporative 5090 IPM;
objective 3240 IPM; derivative, that is, secondary prepositions and phrasal
expressions, 2080 IPM; qualificative 1160 IPM; partitive 690 IPM; quantificative 430
IPM. As corpus frequencies may vary with the stylistic and thematic balance of the
corpus, the proportions may be more informative: localization (.35);
temporative (.22); objective (.14); derivative (.09); qualificative (.05); partitive (.03);
quantificative (.02). The distribution of semantic rubrics for the preposition “в”
(‘in’) is shown in Fig. 1 below.
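
The comparison with the “1.5 to 1” ratio can be checked by simple arithmetic on the figures just quoted. The following Python sketch is only an illustration, not part of the authors' toolchain: it takes the rubric frequencies from the paragraph above and computes the ratio of each rubric to the next less frequent one. The names RUBRIC_IPM and adjacent_ratios are hypothetical.

```python
# Rubric frequencies for the preposition "в" ('in'), in instances per
# million tokens (IPM), copied from the paragraph above.
RUBRIC_IPM = [
    ("localization", 8090),
    ("temporative", 5090),
    ("objective", 3240),
    ("derivative", 2080),
    ("qualificative", 1160),
    ("partitive", 690),
    ("quantificative", 430),
]


def adjacent_ratios(rubrics):
    """Return (more frequent, less frequent, ratio) for each neighbouring pair."""
    pairs = zip(rubrics, rubrics[1:])
    return [(a, b, fa / fb) for (a, fa), (b, fb) in pairs]


RATIOS = adjacent_ratios(RUBRIC_IPM)

for more, less, ratio in RATIOS:
    print(f"{more} / {less}: {ratio:.2f}")
```

Every neighbouring pair in this list yields a ratio between 1.5 and 1.8, which is the sense in which the rubric frequencies approximate the proportion proposed for indicative categories.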


Some points need clarification. Firstly, the rather small share of contexts expressing
objective concepts is explained by the fact that these are conveyed by means of case
forms: the nominative renders a subject, the accusative an object, the dative an
addressee, and the instrumental an instrument. The interpretation of these case forms is
not so straightforward, but this issue is beyond the scope of this paper.
Secondly, a small share of prepositional constructions realize several
rubrics syncretically. For example, лежать в нескольких метрах (‘to lie a few meters away’)
is localization plus quantificative; попасть в чужие руки (‘to fall into the wrong
hands’) is localization plus objective plus set phrase.

Syntaxemes of the localization rubric include the locative proper, expressed by “в”
(‘in’) plus the locative case form [3700 IPM]: сидеть в саду (‘to sit in the garden’),
гулять в лесу (‘to have a walk in the forest’). The same meaning may also be expressed
by the preposition “на” (‘on’) with the locative case [1800 IPM]: сидеть на
стуле (‘to sit on the chair’), дышать воздухом на веранде (‘to breathe air on the
veranda’). In [8] the difference was connected with the idea of “inclusion” for the former,
in contrast to “support” and “contiguity” for the latter. We point out that this
“classification” is purely linguistic, because a veranda is a three-dimensional object and
a person sitting there is inside it. Moreover, both syntaxemes may be used together: сидеть в
кресле на веранде (‘to sit in a chair on the veranda’), висеть в бильярдной на стене
(‘to hang in the pool room on the wall’). This fact is usually taken as evidence that
they have different roles, and we see that the places of localization are included one in
the other, but which one in which? We will therefore consider the first variant to be
locative1 and the second locative2, because it is less frequent. We see
the same parallelism in the directive syntaxeme denoting the end point of a travel
trajectory: “в” (‘in’) plus the accusative case form [3700 IPM]: прийти в сад (‘to
come to the garden’), положить в шкаф (‘to put in the closet’), and “на” (‘on’) with
the accusative case [1570 IPM]: поставить на стол (‘to put on the table’), прийти
на веранду (‘to come to the veranda’). A sequence of a directive and a locative has
a standard interpretation: the locative in postposition is an attribute of the directive:
приехать в город на Неве (‘to come to the city on the Neva River’), приехать на
виллу в Мексике (‘to come to the villa in Mexico’). A sequence of directives is as
ambivalent as a sequence of locatives: отвезти в деревню на виллу (‘to take someone to
the village to the villa’), отправиться на дачу в Барвиху (‘to go to the cottage in
Barvikha’).


The temporative rubric is less frequent than the localization rubric, which is
usually interpreted in terms of the metaphor “time is space” [7]; that is, temporal
syntaxemes are structured on the localization model. The concept of “time” may be
expressed as a deictic category referring to the time of speech, as is usual for verbal
predicates; this is an “absolute” time characteristic. If the reference point differs from
the moment of the speech act, it is a “relative” time characteristic [4]. Owing to the time
metaphor we can see “imagined movement” on the time scale, which presents the continuum of
our experience in which events pass from the future through the present to the past.
Complete isomorphism is impossible, but the opposition of locative and accusative
case forms in the temporative construction recalls the selectional rules for included and
supported objects, though the rules are simpler. The temporative syntaxeme with a
locative case form [2780 IPM] is used for nouns denoting months, years, and longer
periods such as a century or an epoch: в пятом году (‘in the fifth year’), в октябре
(‘in October’), в 19 веке (‘in the 19th century’), в неолите (‘in the Neolithic’).
The temporative syntaxeme with an accusative case form [2300 IPM] is applied to
abstract nouns denoting time and to quantified expressions for hours of the day and night
and days of the week: во время зимовки (‘during wintering’), в период нереста (‘during
spawning’), в пять часов утра (‘at five in the morning’), в пятницу (‘on Friday’).
A sequence of temporatives with the locative case is impossible; the member in
postposition is expressed by a genitive case form: в октябре 1995 года (‘in
October 1995’), whereas accusative temporatives may be concatenated: в воскресенье
в девять часов утра (‘on Sunday at nine o'clock in the morning’).
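
The selectional rule for temporatives with “в” described above can be written down as a small lookup. The Python sketch below is a simplification offered for illustration only: the noun-class labels (month, weekday and so on) and the function names are hypothetical and do not reproduce the tag set or grammar rules of the authors.

```python
# Hypothetical noun classes; the labels are illustrative, not the authors' tag set.
LOCATIVE_CLASSES = {"month", "year", "century", "epoch"}
ACCUSATIVE_CLASSES = {"abstract_time", "clock_time", "weekday"}


def temporative_case(noun_class):
    """Case form required after "в" in a temporative construction."""
    if noun_class in LOCATIVE_CLASSES:
        return "locative"      # в октябре, в 19 веке
    if noun_class in ACCUSATIVE_CLASSES:
        return "accusative"    # в пятницу, в пять часов утра
    raise ValueError(f"no temporative rule for noun class: {noun_class}")


def postposed_case(first_case):
    """Case of a further temporal member placed after the first one.

    A locative temporative cannot be repeated, so the postposed member is
    genitive (в октябре 1995 года); accusative temporatives can be chained
    (в воскресенье в девять часов утра).
    """
    return "genitive" if first_case == "locative" else "accusative"


print(temporative_case("month"), postposed_case("locative"))
print(temporative_case("weekday"), postposed_case("accusative"))
```

A table of this kind only covers the regular cases named in the text; set phrases and syncretic constructions would need separate treatment.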


The temporative with the preposition “на” (‘on’) is used with an accusative case
form at a frequency [930 IPM] comparable with that of the grammatical constructions. It
denotes a quantified temporal period (it is syncretic with the quantificative syntaxeme below):
на 5 дней (‘for 5 days’), на 10 веков (‘for 10 centuries’), or it forms so-called bound
constructions whose head nouns denote time periods with some attribute, usually
adjectival: на ближайшее десятилетие (‘for the next decade’), на данное время
(‘at this time’). Temporal constructions for the preposition “на” (‘on’) with a locative




Section LANGUAGE AND LINGUISTICS 


case form are rather few, and it is better to regard them as set phrases: на будущей неделе
(‘next week’), на данном этапе (‘at this stage’).


The objective prepositional rubric includes various types of objects, such as an
object of action, an object of thought or nomination, an addressee or a participant;
further specification is possible on the subsyntaxeme level. This rubric is “marked”
by its corpus frequency in contrast with the previous one.
It is not a common view, but this sense domain is also structured on the model of
localization [9]. Here too there are parallel syntaxemes for the prepositions “в”
(‘in’) and “на” (‘on’) with a locative case form (2300 IPM and 1120 IPM
respectively) and with an accusative case form (930 IPM and 2000 IPM). The former
objective syntaxeme is more active for “в” (‘in’) than for “на” (‘on’). Several
types of objects are characteristic of the first preposition: an object of perception,
видеть в старых фильмах (‘to see in old movies’); an object of application,
использовать в технике (‘to use in engineering’); an object linked to an abstract
noun that replaces the direct object of a verb, изменения в анализах крови (‘changes
in blood tests’). For the second preposition, “на” (‘on’), the object is some vehicle,
ехать на велосипеде (‘to ride a bike’), or device, спуститься вниз на веревке (‘to
come down on the rope’). G. Zolotova [5] proposed an instrumentive or mediative
syntaxeme for this construction, though there is a wide range of object types:
играть на гитаре (‘to play guitar’), называться на иврите (‘to be called in
Hebrew’), выставить кандидатуру на выборах (‘to run for election’).


The objective syntaxeme with an accusative case form is more active for the
preposition “на” (‘on’). It is used with verbs of communication, отвечать на
вопрос (‘to answer the question’); emotional verbs, обижаться на власть (‘to take
offense at the authority’); and metaphorically shifted travel verbs, заступить на вахту
(‘to stand on watch’), пойти на ваши условия (‘to go to your terms’). For the
preposition “в” (‘in’) this syntaxeme is one of the most infrequent; it is attached to
verbs of transfiguration, превратиться в густую массу (‘to turn into a thick
mass’), превратить жизнь в театр (‘to turn life into theater’), or sometimes to
social verbs: выбирать в органы власти (‘to elect to the authorities’), назначить на
должность руководителя (‘to appoint a manager’).
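
The syntaxeme frequencies quoted so far can be collected into a small inventory that ranks the candidate readings of a given preposition and case form. The Python sketch below is an assumption-laden illustration rather than the authors' grammar module: the Syntaxeme record, the INVENTORY list and rank_readings are hypothetical names, and only the syntaxemes for which the text gives an IPM figure are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syntaxeme:
    preposition: str
    case: str
    reading: str
    ipm: int  # instances per million tokens, as quoted in the text


# Frequencies copied from the paragraphs above; labels are illustrative.
INVENTORY = [
    Syntaxeme("в", "locative", "locative", 3700),
    Syntaxeme("на", "locative", "locative", 1800),
    Syntaxeme("в", "accusative", "directive", 3700),
    Syntaxeme("на", "accusative", "directive", 1570),
    Syntaxeme("в", "locative", "temporative", 2780),
    Syntaxeme("в", "accusative", "temporative", 2300),
    Syntaxeme("на", "accusative", "temporative", 930),
    Syntaxeme("в", "locative", "objective", 2300),
    Syntaxeme("на", "locative", "objective", 1120),
    Syntaxeme("в", "accusative", "objective", 930),
    Syntaxeme("на", "accusative", "objective", 2000),
]


def rank_readings(preposition, case):
    """Rank readings of a preposition + case pair by relative corpus frequency."""
    candidates = [s for s in INVENTORY
                  if s.preposition == preposition and s.case == case]
    total = sum(s.ipm for s in candidates)
    ranked = sorted(candidates, key=lambda s: s.ipm, reverse=True)
    return [(s.reading, s.ipm / total) for s in ranked]


for reading, share in rank_readings("на", "accusative"):
    print(f"{reading}: {share:.2f}")
```

Such a ranking gives only a frequency-based prior; an actual analysis would also have to consult the semantic classes of the governor and the governee.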


The next rubric, the derivative, is a heterogeneous group
incorporating secondary prepositions, set phrases that include primary
prepositions as a component of their structure, and secondary prepositions with a
pronominal specification that turn into adverbial constructions. In this rubric the division
into constructions with locative or accusative case forms is not so important, so we take
them as a whole. Examples for the preposition “в” (‘in’) [2000 IPM] are: в виде
таблетки (‘in pill form’), в области науки (‘in the field of science’), проявить себя
в деле (‘to prove oneself in business’), иметь в виду (‘to keep in mind’), в знак
благодарности (‘in gratitude’), в пользу рекламодателя (‘in favor of the
advertiser’), приводить себя в порядок (‘to trim oneself up’), бросаться в глаза (‘to
strike the eye’). The preposition “на” (‘on’) is more active in this rubric [200 IPM]
relative to its total frequency: шевелиться на ветру (‘to stir in the wind’),
оказаться у всех на виду (‘to be in public view’), принять на борт судна (‘to take
aboard ship’).
The three infrequent rubrics, the qualificative, the partitive and the quantificative,
have barely discernible frequency shares, comparable with the statistical error. They are
syncretic with other syntaxemes, so some examples are given above. Fig. 1
shows the proportions of corpus realization of the semantic rubrics for the
topmost preposition “в” (‘in’) as a whole and those of the described syntaxemes for “в”
(‘in’) and “на” (‘on’).


[Figure 1: two charts. Category labels recovered from the figure: Loc, Dir, Temp-loc, Temp-acc, Obj-loc, Obj-acc, Der, Qual, Part, Quan.]


Fig. 1. Corpus frequency proportions of the semantic rubrics for “в” (‘in’): the left
diagram shows the proposed semantic rubrics in diminishing progression (localization,
temporative, objective, derivative, qualificative, partitive, quantificative); the right
chart shows the corpus frequency proportions of the syntaxemes for “в” (‘in’)
and “на” (‘on’).


To find the correct model for organizing syntaxemes, we must understand
what information needs to be specified at the level of the syntaxeme or the syntaxeme
type. For example, we use syntaxemes in the generative grammar formalism. The
first problem is the main word (or dummy predicate) to which the syntaxeme under analysis
is attached. The well-known English example of ambiguous PP attachment, “I saw
a man with a telescope”, is rendered unambiguously in Russian: Я видел человека с
телескопом (‘I saw a man who had a telescope’) versus Я видел человека в телескоп
(‘I saw a man through a telescope’). Naturally, there are ambiguous
cases: взять тетрадь в клетку (‘to take a squared notebook’ versus ‘to take a
notebook into the cage’), where the latter reading hardly comes to anyone’s mind.
There is a syntactical device, the so-called redistribution of syntactical links: when the
prepositional construction is linearly separated from its governor verb by an object, the
syntactical link between the verb and the PP is weakened and a link between the noun and
the PP becomes possible. This device makes possible nominal constructions
with a PP, such as картина на стене (‘a picture on the wall’). As stated above, for
the purposes of systematic analysis we interpret the governors of these constructions as
dummy verbs.
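
The redistribution of syntactical links described above amounts to a rule for enumerating the possible governors of a PP. The Python sketch below illustrates that rule under strong simplifying assumptions: the coarse tags "V" and "N", the placeholder "DUMMY_V" and the function governor_candidates are all hypothetical and are not taken from the AGFL grammar.

```python
def governor_candidates(tags):
    """List possible governors for a PP that follows the given words.

    `tags` holds coarse tags of the words preceding the PP, for example
    ["V", "N"] for a verb followed by its object noun.
    """
    if "V" not in tags:
        # Nominal construction such as "картина на стене":
        # the governor is interpreted as a dummy verb.
        return ["DUMMY_V"]
    candidates = ["V"]
    verb_index = tags.index("V")
    # An object noun standing between the verb and the PP weakens the
    # verb link and makes attachment to the noun possible as well.
    if "N" in tags[verb_index + 1:]:
        candidates.append("N")
    return candidates


print(governor_candidates(["V"]))        # сидеть | в саду
print(governor_candidates(["V", "N"]))   # взять тетрадь | в клетку
print(governor_candidates(["N"]))        # картина | на стене
```

The sketch only enumerates candidates; choosing between them would rely on the syntaxeme frequencies and the semantic classes of the words involved.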


CONCLUSION AND FUTURE WORK 


The paper offers a construction grammar perspective on identifying the meanings
of prepositions in Russian. To solve natural language processing tasks, we
need to learn how to uncover semantic relations in texts, especially since in Russian a
great number of them are conveyed by prepositions.




We are collecting a series of prepositional constructions and arranging them
according to the frequencies of their specified meanings in corpora of modern Russian texts.
Different semantic aspects of prepositional constructions are described with semantic
rubrics, which are based on the notion of the syntaxeme proposed by G. Zolotova. Our final
goal is to create a corpus-based quantitative ontology of Russian prepositions.


The semantic rubrics presented in our approach help to organize rather vague
prepositional meanings. Their affinities and differences may be explicated through the
overlap of the semantic classes of governing and subordinate words. The overall structure
of prepositional frequencies, which has not been investigated so far, and the arrangement
of semantic units expressed in text contexts are resources for compiling a
quantitative prepositional grammar of Russian.


We are going to compile a first version of the essential semantic rubrics in order to
proceed in the outlined direction and to grasp the sense distribution of primary prepositions.
Then we will assign the secondary prepositions to these sets. In this way we will check the
initial hypothesis that the granularity of prepositional meanings is restricted by the
meaningful diversity of secondary prepositions.


Further stages of our project include: 


to clarify the set of syntaxemes for prepositional constructions with reference to
the semantic types of governors and governees, on the basis of corpus data;


to compile sets of prepositional constructions from corpora of different genres in 
order to discover the significant variation of statistical parameters; 


to describe prepositional constructions in terms of predominant semantic classes 
and/or lexemes used as “governors”; 


to list predominant semantic classes and/or lexemes used as “governees” for 
different semantic rubrics and/or syntaxemes; 


to create a database of Russian prepositional constructions accumulating corpus 
material with statistical information obtained; 


to compile rules of the hybrid generative grammar showing the use of 
prepositional phrases for expressing the comprehensive set of syntaxemes. 